Big Data on electronic records of social interactions allow approaching human behaviour and
sociality from a quantitative point of view with unprecedented statistical power. Mobile telephone
Call Detail Records (CDRs), automatically collected by telecom operators for billing purposes,
have proven especially fruitful for understanding one-to-one communication patterns as well as
the dynamics of social networks that are reflected in such patterns. We present an overview of
empirical results on the multi-scale dynamics of social behaviour and networks inferred from mobile
telephone calls. We begin with the shortest timescales and fastest dynamics, such as burstiness of
call sequences between individuals, and "zoom out" towards longer temporal and larger structural
scales, from temporal motifs formed by correlated calls between multiple individuals to long-term
dynamics of social groups. We conclude this overview with a future outlook.


I. INTRODUCTION 

Electronic records have revolutionised studies of human
social behaviour. Instead of having to rely on field
observations or costly questionnaire-based surveys, today's
social scientists can follow social interactions between
millions of individuals with the help of email and
social media logs as well as CDRs (Call Detail Records)
extracted from billing systems of mobile telephone operators.
In addition to social scientists, this data-driven
movement has attracted large numbers of physicists,
interested in a variety of topics such as collective behaviour
and emergent network structures. The term computational
social science has been coined to describe this new
field of inquiry [1].

Data sets on mobile telephone calls have certain advantages
over other sources for studying social behaviour.
First, mobile telephones are ubiquitous and used by all
age groups and in all social strata, whereas the user base
of, say, Twitter cannot yet be considered representative
of the general population. Second, a phone call needs
to be picked up before its details are recorded as CDRs
by the operator (caller, callee, time, duration). Hence,
CDRs are records of verified, time-stamped one-to-one
communication. This greatly facilitates constructing social
networks from the data, and especially allows for
temporal analysis of communication patterns, in contrast
to, e.g., emails, where recipient lists may be long and
where there is no guarantee of when (or if!) an email has
actually been read.

Because of the above, mobile phone call records have
been used in numerous studies on diverse topics [2]: social
network structure (e.g. [?]), geography of social
relationships (e.g. [?]), disaster response (e.g. [7]), economic
development (e.g. [?]), and human mobility patterns
(e.g. [?]), to name a few. In the earliest investigations,
it was typical to aggregate calls between individuals
over time and treat the resulting networks as static [3],
or to consider slow dynamical processes such as dynamics
of social groups being formed and merged [?]. Recently,
however, there has been increased interest in dynamics
on multiple time scales - time stamps of individual calls,
statistics of inter-call times, and their network-level
consequences have become focal topics (e.g. [?]). This is
both because of the rich dynamics observed in empirical
data, and because of the added level of detail for understanding
human behavioural patterns. At the same time,
there has been a general increase of interest in temporal
networks, networks that consist of nodes that are connected
by events or contacts only at specified times [?].

In this paper, we attempt to provide a brief overview of
what is known to go on in mobile telephone communication
and related social networks at different time scales,
from short to long. This range of time scales is also related
to structural scales, as illustrated in Figure 1. At
the shortest time scales the focus is on the timings of
individual calls and their correlations, and the relevant
structural units are nodes and ties. Moving on to dynamics
on longer time scales, the focus gradually shifts
to sets of ties, such as egocentric networks - sets of ties
surrounding an individual - and social groups and communities.
Finally, there are dynamics at the level of entire
networks. A future outlook - where the field is heading,
and where it should be heading - is given at the end.

II. HOW TO CONSTRUCT SOCIAL 
NETWORKS FROM CDRS 

Before we address the network dynamics that are at 
the focus of this article, let us briefly explain how social 
networks are constructed from CDRs. 

Typically, the source data consists of entries containing
at least the following items: caller id, callee id, time of
event, duration of event (if any), event type (e.g. call,
text message or multimedia message). These entries span
some range of time, e.g. a month or six months. Often,
the data has been filtered to only contain ids of customers
of the source operator, and perhaps only those of private
subscribers (non-company users). The ids are typically
hashed versions of the original phone numbers - surrogate
keys - generated for privacy reasons at or close to the
data source.

FIG. 1. An overview of temporal and structural scales in mobile call networks, from activity on short time scales at the level
of ties and nodes to slower dynamics at mesoscopic and network-wide scales.

An unweighted, static social network can then be constructed
simply by considering the ids as nodes, and connecting
two nodes by a link if there are call or message
events between them in the data. Here, some filtering is
usually applied, e.g. by requiring that there are one or
several calls from i to j and vice versa for an i-j link to
exist. One may also consider link weights, that is, social
tie strengths, computed either as total numbers of calls
or total call duration between two individuals [3]. Obviously,
for such static aggregated networks, the time span
covered by the data has an effect on the outcome [?].

The simplest way of constructing dynamic networks is
then to split the data into consecutive time windows and
apply the above procedure to each window, yielding a
discrete time series of time-dependent links that may be
weighted. For the most fine-grained dynamics, the concept
of links is practically discarded as links are only
considered a substrate for communication events. The
events themselves form temporal networks, where callers
and recipients are linked by a time-stamped contact only
at those time points when there is a call (or message) in
the CDR data. This is the case when activity patterns
are studied at nodal or tie level.
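
As an illustration of the aggregation and windowing steps described in this section, the following minimal Python sketch builds weighted ties from a list of records. The record layout `(caller, callee, time, duration)`, the function names, and the `min_calls_each_way` threshold are assumptions made for this example; they are not taken from any of the cited studies.

```python
from collections import defaultdict


def build_static_network(events, min_calls_each_way=1):
    """Aggregate (caller, callee, time, duration) records into undirected ties.

    A tie i-j is kept only if there are at least `min_calls_each_way` calls
    from i to j AND from j to i (a reciprocity filter).
    Returns {frozenset({i, j}): (number_of_calls, total_duration)}.
    """
    counts = defaultdict(int)
    durations = defaultdict(float)
    for caller, callee, _time, duration in events:
        if caller == callee:
            continue  # ignore self-calls
        counts[(caller, callee)] += 1
        durations[(caller, callee)] += duration

    network = {}
    for (i, j), n_ij in counts.items():
        n_ji = counts.get((j, i), 0)
        if n_ij >= min_calls_each_way and n_ji >= min_calls_each_way:
            total_duration = durations.get((i, j), 0.0) + durations.get((j, i), 0.0)
            network[frozenset((i, j))] = (n_ij + n_ji, total_duration)
    return network


def windowed_networks(events, window_length, min_calls_each_way=1):
    """Split events into consecutive time windows and aggregate each one."""
    if not events:
        return []
    t0 = min(e[2] for e in events)
    windows = defaultdict(list)
    for e in events:
        windows[int((e[2] - t0) // window_length)].append(e)
    return [
        build_static_network(windows.get(k, []), min_calls_each_way)
        for k in range(max(windows) + 1)
    ]
```

With short windows, the reciprocity filter removes many ties that would survive in the fully aggregated network, which is one concrete way in which the time span of the data affects the outcome.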


III. ACTIVITY PATTERNS AT THE LEVEL OF 
NODES AND TIES 

A. Human communication is bursty 

Let us now begin our journey from the smallest towards
the largest by looking at the very atoms of communication,
the individual communication events. Since we are
interested in the time domain, it is then natural to focus
on the timings of events: what can we say about their
properties and statistics? It has become apparent in
recent years that human activity in general is rarely
uniformly randomly distributed in time. Instead, human
activity patterns are commonly bursty [?] - there are
rapid bursts of successive events that are separated by
longer periods of inactivity. Mobile telephone calls are
no different: time series of calls are typically bursty,
and accordingly, the distribution of times between calls
is heavy-tailed [?]. Fig. 2 displays an example
of this alternation between bursts of calls and periods of
no communication, for one individual and three of his/her
social ties. All displayed time series are clearly bursty,
with a high level of variance in times between successive
calls.

FIG. 2. a) Timeline of all outgoing calls of one individual for
one month, b) timeline of calls from the same individual to
his/her three top acquaintances. In network terms, a) represents
the timeline of a node, and b) the timelines of links
(social ties). Data from [19], figure after [?].

FIG. 3. Probability density functions of scaled inter-call times
on edges, i.e. between pairs of individuals. Because of the
broad distribution of call activity levels of links, links have
been grouped according to their number of calls. Then, for
each group, the PDF of scaled inter-event times $\Delta t/\overline{\Delta t}$ has been
calculated separately, where $\overline{\Delta t}$ is the average inter-call time
for the group.

The level of burstiness in a time series can be measured
with a single quantity $B = (\sigma_{\Delta t} - \overline{\Delta t}) / (\sigma_{\Delta t} + \overline{\Delta t})$,
where $\sigma_{\Delta t}$ is the standard deviation and $\overline{\Delta t}$ the mean of inter-event
times $\Delta t$ [21]. $B$ approaches 1 for very bursty sequences, equals 0 for
a Poisson process, and equals -1 for perfectly regular sequences.
However, it pays off to inspect the statistics
of inter-call times in more detail. At first, it would
seem natural to look for burstiness in the statistics of
call timings by simply inspecting the probability density
function (PDF), $P(\Delta t)$, of all times $\Delta t$ between successive
calls (either of an individual, or associated with one
social tie). However, the result would be difficult to interpret,
as it would arise from a mixture of inhomogeneities:
it is known that the general activity levels of individuals
and the number of calls on each of their links are
also broadly distributed (see, e.g., [3, 19, 22]). Because
of this, the typical way of characterising the statistics of
inter-call times, introduced in [?], is to first group either
the individuals or their ties according to their total
number of calls. Then, one can compute separately for
each group a scaled version of the inter-call time PDF,
$P(\Delta t/\overline{\Delta t})$, where $\overline{\Delta t}$ is the average inter-call time computed
for the group in question. It has been observed that
this results in data collapse, where the scaled PDFs for
different groups closely match, both for inter-call times of
individuals [?] and inter-call times of ties, i.e. between
pairs of individuals [?].
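
The burstiness parameter and the rescaling by the group average can be sketched in a few lines of Python. This is only an illustration under simple assumptions: timestamps are plain numbers, a "group" is a list of timestamp sequences that have already been binned by call count, and the function names are hypothetical.

```python
import statistics


def inter_event_times(timestamps):
    """Times between successive events of one node or one tie."""
    ts = sorted(timestamps)
    return [later - earlier for earlier, later in zip(ts, ts[1:])]


def burstiness(inter_times):
    """B = (sigma - mean) / (sigma + mean), with sigma the standard deviation."""
    mean = statistics.fmean(inter_times)
    sigma = statistics.pstdev(inter_times)
    return (sigma - mean) / (sigma + mean)


def scaled_inter_event_times(group_of_sequences):
    """Pool the inter-event times of one activity group and divide by their mean.

    `group_of_sequences` is a list of timestamp lists (nodes or ties that
    were grouped beforehand according to their total number of calls).
    """
    pooled = [dt for ts in group_of_sequences for dt in inter_event_times(ts)]
    mean_dt = statistics.fmean(pooled)
    return [dt / mean_dt for dt in pooled]
```

A histogram of the values returned by `scaled_inter_event_times`, computed separately for each activity group, is what would be compared across groups to check for data collapse.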

Figure 3 displays the scaled inter-call time distributions
for ties, computed for a subset of the data used
in [?]. There are three regions of interest. First, for low
$\Delta t/\overline{\Delta t}$, there is no data collapse, indicating the existence
of a time scale that does not depend on average inter-call
times. This time scale has to do with repeated and forwarded
calls, and we will return to this in Sec. III B. The
non-scaling region is followed by a power-law decay of
inter-call times, indicating the presence of burstiness in
the data. Finally, the PDF drops steeply; this can be
associated with the effect of a finite observation period.
See [23] for a discussion on estimating true inter-event
time distributions from finite observation periods.

As seen above, the timings of calls are bursty both for
individuals (nodes) and their social ties (links). Which
elements, then, are the drivers of burstiness? Is link
burstiness "inherited" from the burstiness of nodes, or
is node-level burstiness merely a consequence of the links
being bursty? In Ref. [?], Karsai et al. argue that the
latter explanation is correct. Their argument is based
on correlations within call sequences on links, that is,
between pairs of individuals. The existence of such correlations
at the nodal level was shown in [24] by considering
the distribution of the numbers of events $E$ in
bursty periods, that is, trains of calls where each successive
call takes place within some $\delta t$ time units of the
previous one. The distribution of event numbers $P(E)$ was
seen to follow a power law in the original data, whereas the
corresponding distribution $P_{\mathrm{ref}}(E)$ for randomised
reference data with shuffled inter-call times on links has
a different form. This shuffled reference corresponds
to a case where the inter-call time distribution of
links is the same as in the original data, but all correlations
have been removed. Since $P(E) \neq P_{\mathrm{ref}}(E)$, there
are correlations within the call trains. It should be noted
that there are other suggested explanations; it has been
argued based on e-mail data that burstiness results from
the interplay of Poissonian processes and circadian and
weekly patterns [25]. Ref. [?] claims that this is not the
case, based on a procedure that removes the effects of
such patterns from CDR data.
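
To make the train-counting procedure concrete, here is a minimal Python sketch that extracts the numbers of events $E$ in bursty trains for a given threshold and builds a shuffled reference sequence. The function names and the convention that a gap exactly equal to the threshold still belongs to the same train are assumptions of this example, not details taken from [24].

```python
import random


def train_sizes(timestamps, delta_t):
    """Sizes E of bursty trains: runs of events whose successive gaps are <= delta_t."""
    ts = sorted(timestamps)
    if not ts:
        return []
    sizes = []
    current = 1
    for previous, following in zip(ts, ts[1:]):
        if following - previous <= delta_t:
            current += 1
        else:
            sizes.append(current)
            current = 1
    sizes.append(current)
    return sizes


def shuffled_reference(timestamps, rng=None):
    """Same inter-event times in random order: keeps their distribution,
    destroys correlations between successive inter-event times."""
    ts = sorted(timestamps)
    if not ts:
        return []
    rng = rng or random.Random()
    gaps = [later - earlier for earlier, later in zip(ts, ts[1:])]
    rng.shuffle(gaps)
    shuffled = [ts[0]]
    for gap in gaps:
        shuffled.append(shuffled[-1] + gap)
    return shuffled
```

Comparing the histogram of `train_sizes` for the original sequences with that for many shuffled realisations is the kind of test that reveals whether correlations are present.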

The underlying network structure can have dramatic
effects on dynamics taking place on networks, and this
is also true for temporal networks and their inhomogeneities.
In particular, the effect of burstiness on spreading
processes on temporal networks has been an active research
topic lately. The studied spreading processes include simple,
deterministic Susceptible-Infectious (SI) spreading
where contact events always transmit an "infection" from
infectious to susceptible individuals, as well as more complex
processes, such as threshold dynamics. Simulations
of such processes on top of empirical contact sequences
have shown that non-Poissonian inter-event times have
effects on spreading dynamics in email networks [?],
call networks [?], contact networks [29], and
in various temporal network models [30, ?]. For temporal
networks of mobile telephone calls, the current understanding
is that burstiness slows down network-wide
spreading as compared to a reference case of Poissonian
inter-call times [?]. However, it may also speed up
the very early stages of spreading dynamics. The slowing-down
because of burstiness has a simple explanation:
high variance of inter-call times increases the expected
waiting times on links. This is the classical waiting time
paradox. Another way of viewing the effect of burstiness
is to consider the latencies of temporal paths consisting
of time-respecting sequences of calls [32]: temporal paths
take longer to traverse when call sequences are bursty,
and deterministic SI spreading by definition follows the
fastest temporal paths. However, the general picture of
the effects of burstiness is still far from complete; there
are conflicting results and special cases [29, 31].
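
The waiting time paradox can be illustrated with a short calculation. If an "infection" arrives at a node at a uniformly random moment, it is more likely to land inside a long inter-call interval than a short one, and the expected wait until the next call on the link is $\overline{\Delta t^2} / (2\,\overline{\Delta t})$, which grows with the variance of $\Delta t$. The sketch below is a minimal illustration of this standard result; the function name is hypothetical.

```python
def mean_waiting_time(inter_times):
    """Expected wait until the next event for a uniformly random arrival time.

    An interval of length dt is hit with probability proportional to dt,
    and the average wait within it is dt / 2, which gives
    sum(dt**2) / (2 * sum(dt)).
    """
    total = sum(inter_times)
    return sum(dt * dt for dt in inter_times) / (2 * total)


# Two sequences with the same mean inter-event time (2 time units):
regular = [2, 2, 2, 2]
bursty = [1, 1, 1, 5]
print(mean_waiting_time(regular))  # 1.0
print(mean_waiting_time(bursty))   # 1.75
```

Both sequences have the same number of calls in the same total time, yet the bursty one makes a randomly arriving spreader wait longer on average.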


B. There are temporally correlated call patterns 

Moving beyond individual nodes and links towards
larger network neighbourhoods, it is natural to expect
that the timings of calls should reflect social behaviour
in groups. Here, at the smallest level, one would expect
to see timing correlations between calls to and from
one individual's acquaintances, i.e. between calls on adjacent
links. This is indeed the case - such correlations
are the reason for the non-scaling region in the distribution
of inter-call times (Fig. 3, [?]). The lack
of data collapse indicates a time scale measured in real
units of time instead of group averages. The time scale
in question is approximately 20-30 seconds and it corresponds
to the typical time it takes to return or forward a call (get a
call and then call someone else). The peak around 20-30
seconds is clearly visible in the shapes of triggered
time correlation functions in Ref. [28] ("density of preceding
events"). Correlated timings of calls around individuals
play a major role in the dynamics of threshold
processes simulated on mobile call networks [28], and a
lesser role in the dynamics of SI spreading [27]. This has
been seen with the help of temporal reference models
where link-link correlations have been removed. For SIR
(Susceptible-Infectious-Recovered) spreading with small
to moderate transmission probability, such correlations
facilitate spreading and cascades are larger than for a
Poissonian reference [?].

FIG. 4. Counts of triangular 3-call motifs in time-stamped
mobile phone call data. After [36].

It is natural to assume that often, there is some
(causal) connection between subsequent calls involving
the same individual, given that the calls follow one another
within some short time difference $\Delta t$. Such a
connection may be related to information transmission
and forwarding, or received information triggering further
calls ("you should call Mum."). Then, one may
group subsequent calls into sets with the hope of finding
patterns of interest. In Ref. [33], Kovanen et al.
introduced the concept of temporal subgraphs as a way of
achieving this grouping. The concept of temporal subgraphs
builds on the notions of $\Delta t$-adjacency and
$\Delta t$-connectivity. Two calls are $\Delta t$-adjacent if they share a
node and take place within $\Delta t$ time units, and two calls
are $\Delta t$-connected if one can trace a path of $\Delta t$-adjacent
calls between them. Temporal subgraphs are then defined as
sets of $\Delta t$-connected calls.
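
The grouping of calls into maximal $\Delta t$-connected sets can be sketched with a union-find structure. The sketch below is a simplification under stated assumptions: each call is reduced to a single time stamp (call durations are ignored), events are `(caller, callee, time)` tuples, and the function name is hypothetical. It is not the implementation used in [33].

```python
from collections import defaultdict


def temporal_subgraphs(events, delta_t):
    """Group events into maximal sets of delta_t-connected calls.

    events: list of (caller, callee, time).
    Returns a sorted list of lists of event indices.
    """
    parent = list(range(len(events)))

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    def union(a, b):
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            parent[root_b] = root_a

    # Collect, for every node, the time-ordered events it takes part in.
    events_of_node = defaultdict(list)
    for index, (caller, callee, time) in enumerate(events):
        events_of_node[caller].append((time, index))
        events_of_node[callee].append((time, index))

    # Linking consecutive events of a node is enough: if two events of a
    # node are within delta_t, so are all consecutive pairs between them.
    for node_events in events_of_node.values():
        node_events.sort()
        for (t1, i1), (t2, i2) in zip(node_events, node_events[1:]):
            if t2 - t1 <= delta_t:
                union(i1, i2)

    groups = defaultdict(list)
    for index in range(len(events)):
        groups[find(index)].append(index)
    return sorted(groups.values())
```

For example, with `delta_t = 10` the events `("A", "B", 0)` and `("B", "C", 5)` end up in the same set because they share node B and are 5 time units apart, while a later call `("C", "A", 100)` forms a set of its own.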

Analogously to the concept of motifs [?] in static
networks, one may extract all different temporal subgraphs
from a call network (given a choice of $\Delta t$)
and group them into equivalence classes, temporal motifs
[?]. Then, the numbers of subgraphs in different
classes provide information on temporal processes
in the network - e.g., ubiquitous chains and stars may
reflect transmission and spreading of information. For
temporal motifs, the equivalence classes are best defined
on the basis of the order of calls: a temporal subgraph
where A calls B who calls C should be equivalent to a
subgraph where D calls E who calls F at some different
point in time. Then, class equivalence can be addressed
by mapping the subgraphs to directed, coloured graphs
and then applying graph isomorphism techniques [33].

For mobile call networks, temporal motif analysis of
all 3-event motifs reveals that the most common ones
reflect burstiness (three calls from A to B, A calling B
who shortly thereafter calls A back, etc.) [33]. When
counts of triangular motifs are analysed (see Fig. 4), it
is seen that the most common ones can have a direct
causal explanation (e.g. A calls B and C, and then C
calls B), whereas the least common ones appear to have
arisen by random chance (A calls B, C calls B, A calls C).
More detailed results are obtained when properties of the
callers (gender, age) or links (intra- or inter-community)
are taken into account: nodes in common temporal motifs
tend to have similar properties (temporal homophily),
female motifs are different from male motifs (chains and
stars vs. "ping-pong"), and motifs within communities
are more complex than those between communities [?].

C. There are daily and weekly rhythms 

In addition to the micro-level correlations between timings
of calls discussed above, there are rhythms whose
origins are in the behavioural patterns of individuals but
that are also clearly apparent at the aggregate level.
In general, human activity follows a circadian rhythm,
phase-locked to the day-night cycle, and this is also evident
in call activity (see, e.g., [15, ?]). Here, an interesting
application is to consider the geospatial aspects
of circadian rhythms, and measure call frequencies at
different times of the day at different (tower) locations,
which helps to understand spatiotemporal hotspots and
the "rhythms of cities" [38]. Besides daily rhythms, there
are also differences between weekdays - not only in terms
of call activity levels, but also in terms of who is being
called [?]: weekends are different from weekdays.
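
Daily and weekly rhythms of this kind can be made visible by simply binning call times by hour of day and by day of week. The sketch below assumes that call times are Unix timestamps in seconds and that a single time zone applies to all callers; both are assumptions of this example, and real CDR data may use a different time format.

```python
from datetime import datetime, timezone


def hourly_call_counts(unix_timestamps, tz=timezone.utc):
    """Number of calls starting in each hour of the day (index 0-23)."""
    counts = [0] * 24
    for ts in unix_timestamps:
        counts[datetime.fromtimestamp(ts, tz).hour] += 1
    return counts


def weekday_call_counts(unix_timestamps, tz=timezone.utc):
    """Number of calls on each day of the week (index 0 = Monday, 6 = Sunday)."""
    counts = [0] * 7
    for ts in unix_timestamps:
        counts[datetime.fromtimestamp(ts, tz).weekday()] += 1
    return counts
```

Computing the same histograms separately for each tower location, or separately for each social tie, would be a natural next step for studying spatial rhythms or weekday/weekend differences in who is being called.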

IV. DYNAMICS OF TIES 

A. Mechanisms for tie creation and destruction 

Communication events are constituents of human relationships.
At longer time scales (typically months),
some social relationships are formed while others decay
in time. The dynamics of ties is not random; several
factors moderate their dynamics. First, there are meaningful
social mechanisms behind link dynamics that have
to do both with the intentions of individuals and with stochastic
elements. Sociological studies have revealed that many social
mechanisms such as triadic closure (embeddedness),
homophily, reciprocity, geographical proximity, or preferential
attachment trigger the process of link formation
(and conversely link decay) [39]. For example, it is very
likely that a link is formed between two persons who
already share a wealth of common friends, or who happen
to live near one another. In fact, these well-known
mechanisms are behind most of the friend recommendation
algorithms in electronic social networks [?]. Second,
the amount of social interactions that humans can handle
is constrained: time, socio-economic status, and/or
cognitive capacity do limit the number of social connections
we can maintain in time [?]. This impacts
how humans distribute communication between
their connections, and also how humans balance the process
of creating new links with that of destroying old
ones.

FIG. 5. Entanglement between bursty dynamics of ties and
their formation/decay. Timelines of calls within 5 given ties
$i \leftrightarrow j$. The gray areas show two different observation periods.
If the observation period is very small compared with typical
inter-event time $\delta t_{ij}$ (e.g. $\Omega_2$) we might underestimate the
ties present. For longer observation periods $\Omega_1$ we need large
longitudinal databases to assess that links are created (e.g.
link b) or destroyed (e.g. link a).


Mechanisms such as those discussed above should leave
traces in empirical data on the dynamics of ties, in the
shape of deviations from randomness or invariant features,
and many studies in recent years have attempted
to uncover the salient statistical properties of
tie dynamics. This has only been possible recently, because
the typical time scale of tie dynamics (months)
requires longitudinal databases with long periods of observation
(years), see Figure 5. On top of that, while
there is an explicit friending procedure on some communications
platforms (e.g. Facebook), in most situations,
such as with CDRs, tie formation or decay has to be estimated
from the initiation and termination of activity
within the tie. As we have shown before, communication
events are bursty, and a long inactivity period can be
mistaken for an absence of the tie. This could be alleviated
by using longer and different observation windows
(typically around 6 months) [?], since short time
windows can underestimate the main structural properties
of the network while overestimating tie dynamics due
to the bursty activity within links [?].

B. Dynamics of tie creation and destruction 

In Ref. [44], Miritello et al. studied the dynamics of
formation and decay of individual links, using a large longitudinal
database (19 months) of mobile phone records.
They found that, like other human activities, link formation/decay
events also happen in bursts, i.e. rapid successive
creation/destruction events of ties are separated
by longer periods of inactivity. Despite this bursty behavior,
a strong cutoff in the distribution of inter-event
times was found, suggesting that there is a typical time
scale of tie dynamics: in Ref. [44] it was observed that on
average around one tie is created/destroyed per month in
human communication. The existence of this time scale
even at the individual level implies that social neighbourhoods
change linearly in time. However, as mentioned
before, not all links are equally likely to decay. Miritello
et al. [?] found that after 6 months of activity the persistence
of social neighborhoods (i.e. the fraction of an individual's
links that remain active during those 6 months)
was around 75%, compared to the expected 50% of a null
model in which each of an individual's links is equally likely
to disappear. Similar long-term persistence of some links
was found in Burt's study of 4 years of relationships of
individuals in a financial organization [?].
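
The persistence of a social neighbourhood, read here as the fraction of an individual's initially active ties that are still active at the end of the period, can be computed as a simple set overlap. This is a minimal sketch of that reading; the exact operational definition of an "active" tie used in the cited work may differ, and the function name is hypothetical.

```python
def neighbourhood_persistence(ties_at_start, ties_at_end):
    """Fraction of the ties active at the start that are still active at the end.

    Both arguments are collections of contact ids for one individual.
    Returns None if there were no ties at the start.
    """
    start = set(ties_at_start)
    if not start:
        return None
    return len(start & set(ties_at_end)) / len(start)


# Three of the four initial contacts are still active six months later.
print(neighbourhood_persistence({"b", "c", "d", "e"}, {"b", "c", "d", "f"}))  # 0.75
```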

Information diffusion is also affected by the dynamics
of tie creation/destruction. Several works have found
that in general tie dynamics alone (without considering
the bursty nature of events within each tie) slows
down the propagation of information [?, 49] when compared
with null models in which links are usually taken as
static. This is due to the unrealistic assumption that all links
observed in a period of time are capable of transmitting
information throughout that observation time. The high
link turnover observed in human communication suggests
that the static picture of human ties overestimates the
connectivity potential of individuals.


C. Social strategies and persistent patterns 

Even though ties are continually created and destroyed
by individuals at a fast (monthly) pace (their social activity),
it was found in [44] that the rates of creation
and destruction are similar for each individual. This implies
that the number of active ties at a given instant
(the social capacity of an individual) is almost constant
in time. Different combinations of capacity and activity
were found that define for each individual a dynamical
strategy of communication: while social explorers have
large levels of activity compared to their communication
capacity, resulting in a fast turnover of their neighborhood,
social keepers activate/deactivate a smaller number
of connections compared to their capacity and their
social neighborhood is mostly stable (see Fig. 6). Those
dynamical communication strategies depend on the age
and gender of individuals, with both capacity and activity
decreasing as a function of age and being larger for
men than women. Finally, it was found that there was a
significant assortativity of social strategies, meaning that
social explorers/keepers tend to gather. These findings
render a dynamical picture of the network with very different
rhythms of evolution: highly static areas of social
keepers live together with extremely volatile groups of
social explorers.

Social strategies also have an impact on an individual's
capacity to access information that is being propagated
in a network. Using similar SI simulations as mentioned
previously, Miritello et al. [44] found that (for
a fixed number of different contacts), social keepers received
(i.e., became infected with) the information faster
than social explorers. This result suggests that the information
access benefits of diverse ties of social explorers
are outweighed by their short lifespan, resulting
in a net delay in access to information from individuals
activating them.
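
For readers unfamiliar with such simulations, the following is a minimal sketch of deterministic SI spreading on a time-ordered sequence of calls. It assumes that every call transmits the information in both directions and reduces each call to a single time stamp; these are illustrative choices, not the setup of Ref. [44], and the function name is hypothetical.

```python
def si_spreading(events, seed, start_time=float("-inf")):
    """Deterministic SI spreading over time-stamped calls.

    events: iterable of (caller, callee, time).
    Every call at or after `start_time` between an informed and an
    uninformed individual informs the latter (in either direction).
    Calls with identical time stamps are processed in input order.
    Returns {node: time at which the node became informed}.
    """
    informed = {seed: start_time}
    for caller, callee, time in sorted(events, key=lambda e: e[2]):
        if time < start_time:
            continue
        if caller in informed and callee not in informed:
            informed[callee] = time
        elif callee in informed and caller not in informed:
            informed[caller] = time
    return informed


calls = [("a", "b", 1), ("c", "d", 2), ("d", "e", 2.5), ("b", "c", 3)]
print(si_spreading(calls, "a", start_time=0))  # {'a': 0, 'b': 1, 'c': 3}
```

In the example, d is never reached even though the aggregated network connects a to d: the only c-d call takes place before c is informed. This is the sense in which spreading has to follow time-respecting paths, and why ties that are active only briefly contribute less than a static picture would suggest.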

In Ref. [19], Saramäki et al. have pointed out another
feature of individuals' networks that remains invariant
even though there is network turnover - the frequency-rank
relationship of numbers of calls to others, called
their social signature. This means that for a given individual,
the fraction of calls targeted to each acquaintance
depends on how highly they rank in that individual's
network in terms of call numbers, not their identity.
Hence, each individual has a characteristic social signature
- e.g., placing a higher fraction of calls to the top
2 acquaintances, or sharing calls more evenly among everyone.
These results were obtained using 18 months of
data for 24 students who finished high school and went to
university or work, which guaranteed rather high levels
of turnover in their social networks.
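
The frequency-rank relationship described above can be computed directly from an individual's list of outgoing calls. The sketch below is a minimal illustration with hypothetical names; it returns the fraction of calls going to the top-ranked acquaintance, the second-ranked one, and so on, discarding identities.

```python
from collections import Counter


def social_signature(callees):
    """Fraction of an individual's calls going to each acquaintance, by rank.

    callees: iterable with one entry (the callee id) per outgoing call.
    Returns fractions ordered from the most-called acquaintance downwards.
    """
    counts = Counter(callees)
    total = sum(counts.values())
    return [n / total for _callee, n in counts.most_common()]


# Two observation windows with different people but the same signature.
first_window = ["mum", "mum", "mum", "mum", "friend", "friend", "x", "y"]
second_window = ["partner"] * 4 + ["boss"] * 2 + ["p", "q"]
print(social_signature(first_window))   # [0.5, 0.25, 0.125, 0.125]
print(social_signature(second_window))  # [0.5, 0.25, 0.125, 0.125]
```

Comparing signatures of the same individual in successive time windows, and against those of other individuals, is one way to probe the persistence described above.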

Ref. [50] looked at egocentric network evolution from 
a different point of view - that of new communication 
events being associated with existing or newly appearing 
ties. They found a universal formula for the probability 
of observing new versus existing ties. 


V. COMMUNITY AND NETWORK 
EVOLUTION 

At even longer time scales - years - we find that some
parts of the network or even the network itself change
dramatically. Palla et al. [10] found with CDR and
other data that social groups or communities within networks
have their own dynamics. Concurrent tie formation/decay
events inside and around those communities
give rise to growth, contraction, merging, splitting, and
birth or death of communities. Interestingly, Palla et al.
also found that while larger communities are on average
older, they also have higher rates of change. Thus, large
communities survive because of a continuous turnover of
their members: there is a sustained flow of
individuals joining communities and leaving them. Backstrom
et al. found with data on online and co-authorship
networks that membership is contagious: the probability
to join a community depends on the number of friends
previously in the community [51]. The typical time scale of
community dynamics depends on their size, although
there are merging or splitting events in communities that
happen in short periods of time (two weeks in [10]), suggesting
that community evolution resembles the punctuated
equilibrium of biological species.

In order to discuss communication network dynamics
on even larger scales, we have to take a small detour
from the focal topic of call networks and enter the online
world. The recent explosion of online social networks has
made it possible not only to study how successful, massive
networks grow in time [52], but also to perform an autopsy
on the late stages of unsuccessful ones [53]. By studying
the growth of networks like LinkedIn, Delicious or Flickr,
Leskovec et al. [52] found that nodes and edges do not
arrive linearly in time. Rather, their growth dynamics accelerates
after the first year, probably due to non-linear
social effects in the adoption contagion or external mass
media influence, as Toole et al. found for Twitter [54]. Despite
this non-linear effect, tie dynamics happen mostly
following the well-known processes of preferential attachment,
triadic closure [52] or geographical homophily [54].

FIG. 6. Snapshots of the active links (black) in the neighborhood of two different individuals (red symbols) at 5 equally-spaced
times during 6 months. Inactive links at a given instant are in gray. User A) behaves according to the explorer strategy, while
user B) follows the keeper strategy. Link width is proportional to the number of calls. Note that both users present a similar
frequency-rank relationship (signature) along the 6 months.

In addition to network growth, there is also network
decline, or at least membership turnover. When it comes
to social networks built from CDR data provided by a
single carrier, network turnover is mostly related to some
users eventually deciding to abandon the carrier and
switch to another carrier. This is known as churn. The
rate at which client churn happens is large (at around
2% per month for wireless carriers in the US). This large
turnover of the networks can be analyzed using CDRs:
for example, Dasgupta et al. found that, as with community
evolution, the decision to leave the carrier is highly
correlated to the number of friends that have previously
left the carrier [?]. A similar form of social contagion is
behind adoption of products or services in networks: using
2 years of CDR data, Sundsøy et al. [56] found that
product adoption spreads through the social network of
clients.
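
To get a feeling for what a monthly churn rate of about 2% means for longitudinal data sets, one can compound it over time. The sketch below assumes, purely for illustration, that the rate is constant and that each month's churn is independent of the previous ones; the social contagion effects discussed above would violate this assumption.

```python
def retained_fraction(monthly_churn, months):
    """Fraction of the original customers left after `months` months,
    assuming a constant, independent monthly churn rate."""
    return (1.0 - monthly_churn) ** months


# With 2% monthly churn, roughly 78% of the original nodes remain after a year.
print(round(retained_fraction(0.02, 12), 3))  # 0.785
```

Under these assumptions, more than a fifth of the nodes present at the beginning of a one-year data set would no longer be customers at its end.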

The decline of entire networks has been studied with
data on some online social networks that shrank after
their growth phase. Garcia et al. [53] studied two years
of decline of networks like Friendster to investigate the
causes and mechanisms behind their failure. They found
that, as with mobile call networks, individual decisions
on participating in or leaving a community or network
are, to a large extent, determined by the number of one’s
friends in the social network and their engagement with
the community/network. Thus, a fraction of one’s friends
leaving the network/community can trigger one to leave,
resulting in further cascades of leaving events and
eventually in the network shrinking and finally ceasing
to exist. This cascading process accelerates in time, so
network decay can eventually be very fast. For example,
the Friendster network shrank from 60 million users to 10
million users in a year, probably because of a cascading
process (but also because of mass media and competition
with other online social networks).
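
The cascade mechanism described above can be sketched with
a simple fractional-threshold model. This is an
illustrative assumption, not the analysis used in Ref.
[53]: the function name, the threshold rule and the toy
chain network are all hypothetical.

```python
def leaving_cascade(neighbours, initial_leavers, threshold):
    """Synchronous threshold cascade: a user leaves once the fraction of
    their friends who have already left reaches `threshold`.
    Returns the set of users leaving in each round."""
    gone = set(initial_leavers)
    rounds = [set(initial_leavers)]
    while True:
        new_leavers = set()
        for user, friends in neighbours.items():
            if user in gone or not friends:
                continue
            fraction_gone = len(friends & gone) / len(friends)
            if fraction_gone >= threshold:
                new_leavers.add(user)
        if not new_leavers:
            break
        gone |= new_leavers
        rounds.append(new_leavers)
    return rounds


# Hypothetical toy network: a chain a - b - c - d.
neighbours = {
    "a": {"b"},
    "b": {"a", "c"},
    "c": {"b", "d"},
    "d": {"c"},
}
rounds = leaving_cascade(neighbours, {"a"}, threshold=0.5)
no_cascade = leaving_cascade(neighbours, {"a"}, threshold=0.6)
print([sorted(r) for r in rounds])
```

With a threshold of 0.5 a single departure unravels the
whole chain, one user per round, whereas a slightly higher
threshold stops the cascade immediately. This sensitivity
is the qualitative point of the sketch.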

Finally, it is of interest to note that although mobile
call networks are shaped by a number of processes, from
dynamics of egocentric networks and communities to
customer churn, key network-level characteristics such as
connectivity and tie strength distributions appear to
remain stationary over long periods of time. That is,
the details of the networks change, but the big picture
does not.


VI. FUTURE OUTLOOK 

As seen above, mobile telephone call records have allowed
us to better understand human communication dynamics, and
through that, the dynamics of social networks. Where,
then, is this field heading? Interest in mobile telephone
data is certainly still growing, as seen in the success of
e.g. the Netmob conference on mobile phone data set
analysis (www.netmob.org), and there are plenty of open
issues to be addressed. At the same time, there is an
ever-expanding diversity of communication channels, which
necessitates approaches that do not rely on a single
source of data. It may even be that the golden age of
CDR-based research is slowly coming to its end, as the
younger generations adopt new channels of communication
even for voice (Skype, voice over IP). Mobile
communication via such channels is only seen as data
traffic, and details such as recipients of messages are
not recorded by the mobile telecom. With a multitude of
channels operated by different companies, data collection
on massive scales becomes difficult or impossible. This
may necessitate collecting research data via smartphone
apps from consenting volunteers; although the numbers of
participants will necessarily be smaller, this may be
compensated by an increase in data quality and coverage
of multiple channels.

In the following, we will attempt to identify some 
emerging themes and trends. 


A. Experiments: From big data to deep data

The advantage of using CDRs, extracted from mobile
operators’ billing systems, is the sheer size of data,
both in terms of numbers of users and in terms of call
events. However, at the same time, such data is
necessarily shallow. For privacy reasons, information
on phone users is very limited (e.g. age, gender) or not
available at all. Furthermore, mobile telephone calls
represent only one channel in an ever-expanding multitude
of electronic communication channels. Yet data originating
in mobile telephone operator billing systems does not
contain information on any other channels, with the
exception of text messages (whose use has already declined
in the younger generations).

There is only one practical way of sorting out the above
difficulties, and that is collecting research data already
at the user end, for example with a smartphone app
designed for the purpose. As mentioned above, this may
also be the only viable option in the long term because
the increasing diversity of communication channels
results in a lack of coverage by CDRs. Then, instead of
using anonymised large-scale data, one needs to go for
volunteer users. In other words, the data collection phase
becomes its own project - an experiment designed by
researchers. This necessarily limits the number of studied
individuals, as it is hard to scale up any experiment to
the level of entire nations that CDRs cover. Participant
retention is another problem, especially for longitudinal
experiments with intended time spans of years instead
of months. In the earliest experiments that combine call
records with other types of data (such as GPS positioning
and Bluetooth proximity) [58]-[61], the numbers of
participants have been of the order of ~100. An effort
that is larger by one order of magnitude - the Copenhagen
Networks Study with N ~ 1000 - is currently approaching
the end of its data collection phase.

App-based collection of data allows recording a number
of data streams from GPS positioning to usage of
other apps and communication channels. On top of that,
it is possible to apply the traditional method of social
sciences: actually ask the users what they are doing, why,
and how they feel about it, via pop-up surveys. These,
together with psychological profiling (e.g. standard
questionnaires for personality traits), provide a far
more detailed picture of the users than what can be
obtained from mobile telephone operators. Additionally,
surveys can provide valuable information on the nature
and closeness of the social ties captured by electronic
communication, since different types of ties play
different roles in network structure. Given
data sets with temporal information, ground truth, and
enough statistical power, it might even be possible to
associate features of call time series with the nature of
social ties, something that could then be applied to
unlock features of larger data sets.


B. From aggregates to individuals

In the early days of social network analysis with large
databases, focusing on structural properties of networks
and disregarding details such as possible differences
between individuals was the norm (except perhaps for
issues such as broad connectivity distributions of
individuals). Likewise, the social network modelling
paradigm has mainly been that of social atoms: in a
typical agent-based model, each node follows exactly the
same rules as everyone else, and the aim is to see whether
such minimalistic assumptions can already explain
empirically observed features. However, it is evident that
much important information is lost when disregarding
individual differences and relying on system-level summary
statistics on network structure and dynamics. In the worst
case, this can lead to the so-called ecological fallacy
[63], where statistical dependencies seen at the system
level are falsely attributed to the individuals comprising
the system. The opposite is also possible: much
interesting and important variation may be hidden behind
flat system-level averages. It should be noted that
statistical physicists are especially vulnerable to this
problem, being used to system-level statistics describing
the behaviour of large numbers of identical elements.
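
A toy numerical illustration of the ecological fallacy may
help here. The data below are invented for the purpose and
do not come from any call data set: across the averages of
three hypothetical groups the two variables are perfectly
positively correlated, while within each group the
individual-level correlation is perfectly negative.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


# Invented (x, y) pairs for individuals in three hypothetical groups.
groups = {
    "A": [(1, 3), (2, 2), (3, 1)],
    "B": [(4, 6), (5, 5), (6, 4)],
    "C": [(7, 9), (8, 8), (9, 7)],
}

# Individual-level correlation inside each group.
within = {name: pearson(*zip(*points)) for name, points in groups.items()}

# Aggregate-level correlation computed from the group averages only.
group_means = [
    (sum(x for x, _ in pts) / len(pts), sum(y for _, y in pts) / len(pts))
    for pts in groups.values()
]
between = pearson(*zip(*group_means))

print(within, between)
```

Reading the aggregate correlation as a statement about
individuals would get the sign wrong in every group, which
is the error the term describes.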

Moving beyond the aggregate level requires more detailed
information than knowledge of network structure
alone (which is probably why so many studies have
remained there). In addition to harder-to-collect
experimental data (see above), when it comes to CDRs, such
information can be available in the time domain. Even
when the source data contains no direct information on
each individual’s attributes, their behavioural patterns
manifested in temporal event sequences and link dynamics
may allow distinguishing between different types of
individuals, or inferring some personal traits and
features. The discussion on social strategies (keeper,
explorer) above is a good example of this. Known attributes
of individuals such as gender and age have also been
seen to affect their temporal communication patterns
[37]. When call data collected in experiments is
augmented with extra information such as surveys and
psychological profiling, an entirely new set of
possibilities opens up. As an example, correlations
between extraversion and the number of call contacts have
been found in the analysis of CDRs, similarly to Facebook
friends [66].

In addition to data analysis, we expect that the next 
generation of improved social network models will build 
on diversity in individual behaviour instead of identical 
agents. 

C. Data sharing 

Perhaps the biggest problem related to mobile call
datasets is that it is typically impossible to share the
data publicly. Most results on mobile call networks
are outputs of a similar pipeline: a mobile telecom op¬ 
erator agrees to provide anonymised data to a group of 
researchers under strict non-disclosure agreements and 
under the condition that any results to be published must 
be scrutinised by the company first to avoid publishing 
commercial secrets. Anonymisation is typically achieved 
simply by replacing all phone numbers with surrogate 
keys. Since the data cannot be shared, one of the main 
principles of the scientific method, reproducibility of re¬ 
sults, is violated: the only way of checking someone’s 
results is to succeed in obtaining similar data from a 
telecom, and even then the results may differ because 
of e.g. sampling or cultural issues. 

The problem with releasing detailed call data with time 
stamps is that it is very difficult to guarantee that users 
cannot be re-identified even in anonymized data. As an 
example, it is easy to identify oneself by matching the 
time stamps found in the call log of one’s own mobile 
phone with time stamps in CDR data; this also reveals 
everyone who has been called. In fact, structural net¬ 
work information without time stamps may already be 
enough, see [67]. Inclusion of tower location data brings 
even more problems [68] , since individuals can be identi¬ 
fied from their frequented locations [69]. However, if the 
data has been collected experimentally, with consenting 
volunteers, it may be possible to release at least parts of 
it, depending on a number of issues such as agreements 
with participants and the level of anonymity in the re¬ 
leased data. 
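
Both the surrogate-key anonymisation and the time-stamp
matching attack discussed above are simple enough to
sketch in a few lines of Python. The record layout
(caller, callee, time stamp), the function names and the
toy numbers are assumptions made for illustration; real
CDRs and real operator pipelines differ.

```python
def pseudonymise(records):
    """Replace every phone number with an integer surrogate key."""
    keys = {}
    anonymised = []
    for caller, callee, timestamp in records:
        for number in (caller, callee):
            keys.setdefault(number, len(keys))
        anonymised.append((keys[caller], keys[callee], timestamp))
    return anonymised


def reidentify(own_call_times, anonymised):
    """Surrogate keys whose outgoing calls include all given time stamps."""
    times_by_key = {}
    for caller, _, timestamp in anonymised:
        times_by_key.setdefault(caller, set()).add(timestamp)
    wanted = set(own_call_times)
    return [key for key, times in times_by_key.items() if wanted <= times]


# Hypothetical raw records: (caller, callee, time stamp in seconds).
raw = [
    ("alice", "bob", 100),
    ("carol", "bob", 130),
    ("alice", "dave", 250),
    ("carol", "alice", 400),
]
released = pseudonymise(raw)

# "alice" matches the call log stored on her own phone against the release.
candidates = reidentify([100, 250], released)
contacts = sorted({callee for caller, callee, _ in released
                   if caller in candidates})
print(candidates, contacts)
```

Once a single surrogate key has been matched, every tie of
that key in the released data is exposed as well, which is
why replacing identifiers alone gives weak protection.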


Because of the above problems, there are very few
publicly shared data sets - the MIT reality mining data
includes call logs [58], Wu et al. [70] comes with three
sets of data on anonymised, time-stamped text messages, and
Saramäki et al. provide three sets of egocentric networks
aggregated over 6 months each. Notable exceptions in data
sharing are the data challenges that have taken place in
recent years, such as the Nokia Mobile Data Challenge by
Nokia, the two D4D Challenges by Orange or the Telecom
Italia Big Data Challenge [73]. In these challenges,
anonymised samples of call and mobility data are made
available to researchers for a limited time upon request.
The aim is to use the data for research projects that e.g.
have a development dimension, as in the D4D challenges
(Ivory Coast in 2013, Senegal in 2014), or address
applicability in sectors like energy, weather, public and
private transport, and social network studies. Here,
privacy issues have been addressed by a number of
techniques: small samples, aggregation and coarse-graining,
and added noise. It is worth noting that although the data
is available to any researcher who wishes to participate in
the competition, it still comes with a non-disclosure
agreement and its use is limited to the competition. A
remarkable exception is the data by Telecom Italia, who
have opened their challenge data for reuse [74].

Would it then be possible at all to share mobile call 
data, without aggregating out too many details and while 
still preserving privacy? There are no readily available 
solutions to this problem. The concept of homomorphic
encryption has been discussed in contexts such as cloud
security. In this scheme, a limited set of analysis
operations can be conducted on data that is already
encrypted. However, the viability of such a scheme
for CDR analysis is uncertain. Another possibility might
be not to release the data itself, but to let researchers
access it through an Application Programming Interface
(API) that allows using highest-resolution data in
computations, but only provides aggregated results, along
the lines of openPDS [76].
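
The idea of an interface that computes on full-resolution
data but returns only aggregates can be sketched as
follows. This is a hypothetical toy class, not the openPDS
interface: the class name, the record fields and the
minimum group size rule are assumptions, and Python's
underscore convention does not actually prevent access to
the raw records the way a remote service would.

```python
class AggregateOnlyStore:
    """Holds raw call records and answers aggregate queries only."""

    def __init__(self, records, min_group_size=3):
        self._records = list(records)  # raw data, never returned to callers
        self._min_group_size = min_group_size

    def mean_call_duration(self, predicate):
        """Mean duration over records matching `predicate`; refuses to
        answer when too few distinct callers are involved."""
        selected = [r for r in self._records if predicate(r)]
        callers = {r["caller"] for r in selected}
        if len(callers) < self._min_group_size:
            raise ValueError("group too small for an aggregate answer")
        return sum(r["duration"] for r in selected) / len(selected)


# Hypothetical records: caller key and call duration in seconds.
records = [
    {"caller": "u1", "duration": 60},
    {"caller": "u2", "duration": 120},
    {"caller": "u3", "duration": 180},
]
store = AggregateOnlyStore(records, min_group_size=3)
overall_mean = store.mean_call_duration(lambda r: True)
print(overall_mean)
```

A minimum group size is only one of several safeguards one
might want. Repeated, overlapping queries can still leak
individual values, so a real deployment would need
additional protections such as query auditing or added
noise.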